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Summary. The quantum state of a light beam can be represented as an infinite dimensional 
density matrix or equivalently as a density on the plane called the Wigner function. We de- 
scribe quantum tomography as an inverse statistical problem in which the state is the unknown 
parameter and the data is given by results of measurements performed on identical quantum 
systems. We present consistency results for Pattern Function Projection Estimators as well as 
for Sieve Maximum Likelihood Estimators for both the density matrix of the quantum state and 
its Wigner function. Finally we illustrate via simulated data the performance of the estimators. 
An EM algorithm is proposed for practical implementation. There remain many open problems, 
e.g. rates of convergence, adaptation, studying other estimators, etc., and a main purpose of 
the paper is to bring these to the attention of the statistical community. 

1. Introduction 

It took more than eighty years from its discovery till it was possible to experimentally determine and 
visualize the most fundamental object in quantum mechanics, the wave function. The forward route 
from quantum state to probability distribution of measurement results has been the basic stuff of 
quantum mechanics textbooks for decennia. That the corresponding mathematical inverse problem 
had a solution, provided (speaking metaphorically) that the quantum state has been probed from a 
sufficiently rich set of directions, had also been known for many years. However it was only with 
Smithey et al. (1993), that it became feasible to actually carry out the corresponding measurements 
on one particular quantum system — in that case, the state of one mode of electromagnetic radiation 
(a pulse of laser light at a given frequency). Experimentalists have used the technique to establish 
that they have succeeded in creating non-classical forms of laser light such as squeezed light and 
Schrodinger cats. The experimental technique we are referring to here is called quantum homodyne 
tomography: the word homodyne referring to a comparison between the light being measured with 
a reference light beam at the same frequency. We will explain the word tomography in a moment. 

The quantum state can be represented mathematically in many different but equivalent ways, all 
of them linear transformations on one another. One favorite is as the Wigner function W: a real 
function of two variables, integrating to plus one over the whole plane, but not necessarily nonneg- 
ative. It can be thought of as a "generalized joint probability density" of the electric and magnetic 
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fields, q and p. However one cannot measure both fields at the same time and in quantum mechan- 
ics it makes no sense to talk about the values of both electric and magnetic fields simultaneously. 
It does, however, make sense to talk about the value of any linear combination of the two fields, 
say cos((j))^ + sin((j))/?. And one way to think about the statistical problem is as follows: the un- 
known parameter is a joint probability density W of two variables Q and P. The data consists of 
independent samples from the distribution of (X,<t>) = (cos(4>)g + sin(4>)P,4>), where 4> is chosen 
independently of (Q,P), and uniformly in the interval [0,7t]. Write down the mathematical model 
expressing the joint density of (X,<t>) in terms of that of (Q,P). Now just allow that latter joint 
density, W, to take negative as well as positive values (subject to certain restrictions which we will 
mention later). And that is the statistical problem of this paper. 

This is indeed a classical tomography problem: we take observations from all possible one- 
dimensional projections of a two-dimensional density. The non-classical feature is that though all 
these one-dimensional projections are indeed bona-fide probability densities, the underlying two- 
dimensional "joint density" need not itself be a bona-fide joint probability density, but can have 
small patches of "negative probability". 

Though the parameter to be estimated may look strange from some points of view, it is math- 
ematically very nice from others. For instance, one can also represent it by a matrix of (a kind 
of) Fourier coefficients: one speaks then of the "density matrix" p. This is an infinite dimensional 
matrix of complex numbers, but it is a positive and selfadjoint matrix with trace one. The diago- 
nal elements are real numbers summing up to one, and forming the probability distribution of the 
number of photons found in the light beam (if one could do that measurement). Conversely, any 
such matrix p corresponds to a physically possible Wigner function W, so we have here a con- 
cise mathematical characterization of precisely which "generalized joint probability densities" can 
occur. 

The initial reconstructions were done by borrowing analytic techniques from classical tomog- 
raphy - the data was binned and smoothed, the inverse Radon transform carried out, followed by 
some Fourier transformations. At each of a number of steps, there are numerical discretization and 
truncation errors. The histogram of the data will not lie in the range of the forward transformation 
(from quantum state to density of the data). Thus the result of blindly applying an inverse will not 
be a bona-fide Wigner function or density matrix. Moreover the various numerical approximations 
all involve arbitrary choices of smoothing, binning or truncation parameters. Consequently the final 
picture can look just how the experimenter would like it to look and there is no way to statistically 
evaluate the reliability of the result. On the other hand the various numerical approximations tend 
to destroy the interesting "quantum" features the experimenter is looking for, so this method lost in 
popularity after the initial enthousiasm. 

So far there has been little attention paid to this problem by statisticians, although on the one 
hand it is an important statistical problem coming up in modern physics, and on the other hand it is 
"just" a classical nonparametric statistical inverse problem. The unknown parameter is some object 
p, or if you prefer W, lying in an infinite dimensional linear space (the space of density matrices, or 
the space of Wigner functions; these are just two concrete representations in which the experimenter 
has particular interest). The data has a probability distribution which is a linear transform of the 
parameter. Considered as an analytical problem, we have an ill-posed inverse problem, but one 
which has a lot of beautiful mathematical structure and about which a lot is known (for instance, 
close connection to the Radon transform). Moreover it has features in common with nonparametric 
missing data problems (the projections from bivariate to univariate, for instance, and there are more 
connections we will mention later) and with nonparametric density and regression estimation. Thus 
we think that the time is ripe for this problem to be "cracked" by mathematical and computational 
statisticians. In this paper we will present some first steps in that direction. 
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Our main theoretical results are consistency theorems for two estimators. Both estimators are 
based on approximating the infinite dimensional parameter p by a finite dimensional parameter, 
in fact, thinking of p as an infinite dimensional matrix, we simply truncate it to an N x N matrix 
where the truncation level N will be allowed to grow with the number of observations n. The first 
estimator employs some analytical inverse formulas expressing the elements of p as mean values of 
certain functions, called pattern functions, of the observations (X, <£>) . Simply replace the theoretical 
means by empirical averages and one has unbiased estimators of the elements of p, with moreover 
finite variance. If one applies this technique without truncation the estimate of the matrix p as a 
whole will typically not satisfy the nonnegativity constraints. The resulting estimator will not be 
consistent either, with respect to natural distance measures. But provided the truncation level grows 
with n slowly enough, the overall estimator will be consistent. The effect of truncating the density 
matrix p is to project on the subspace generated by the first elements of the corresponding basis, we 
shall call it the Pattern Function Projection estimator (PFP). 

The second estimator we study is sieve maximum likelihood (SML) based on the same trun- 
cation to a finite dimensional problem. The truncation level N has to depend on sample size n in 
order to balance bias and variance. We prove consistency of the SML estimator under an appropri- 
ate choice of N(n) by applying a general theorem of van de Geer (2000). To verify the conditions 
we need to bound certain metric entropy integrals (with bracketing) which express the size of the 
statistical model under study. 

This turns out to be feasible, and indeed to have an elegant solution, by exploiting features of the 
mapping from parameters (density matrices) to distributions of the data. Various distances between 
probability distributions possess analogues as distances between density matrices, the mapping from 
parameter to data turns out to be a contraction, so we can bound metric entropies for the statistical 
model for the data with quantum metric entropies for the class of density matrices. And the latter 
can be calculated quite conveniently. 

Our results form just a first attempt at studying the statistical properties of estimators which are 
already being used by experimental physicists, but they show that the basic problem is both rich in 
interesting features and tractable to analysis. The main results so far are consistency theorems for 
PFP and SML estimators, of both the density matrix and the Wigner function. These results depend 
on an assumption of the rate at which a truncated density matrix approximates the true one. It seems 
that the assumption is satisfied for the kinds of states which are met with in practice. However, 
further work is needed here to describe in physically interpretable terms, when the estimators work. 
Secondly, we need to obtain rates of consistency and to further optimize the construction of the 
estimator. Thirdly, one should explore the properties of penalized maximum likelihood. This will 
make the truncation level data driven. Fourthly, one should try to make the estimators adaptive 
with respect to the smoothness of the parameter. We largely restrict attention to an ideal case of 
the problem where there is no further noise in the measurements. In practice, the observations have 
added to them Gaussian disturbances of known variance. There are some indications that when the 
variance is larger than a threshold of 1 /2, reconstruction becomes qualitatively much more difficult. 
This needs to be researched from the point of view of optimal rates of convergence. 

We also only considered one particular though quite convenient way of sieving the model, i.e., 
one particular class of finite dimensional approximations. There are many other possibilities and 
some of them might allow easier analysis and easier computation. For instance, instead of truncating 
the matrix p in a given basis, one could truncate in an arbitrary basis, so that the finite dimensional 
approximations would correspond to specifying N arbitrary state vectors and a probability distri- 
bution over these "pure states". Now the problem has become a missing data problem, where the 
"full data" would assign to each observation also the label of the pure state from which it came. 
In the full data problem we need to reconstruct not a matrix but a set of vectors, together with an 
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ordinary probability distribution over the set, so the "full data" problem is statistically speaking a 
much easier problem that the missing data problem. We shall use a version of this to derive an 
Expectation-Maximization algorithm (EM) for the practical implementation of the SML estimator, 
see Section 5. One could imagine that Bayesian reconstruction methods could also exploit this 
structure. 

The analogy with density estimation could suggest new statistical approaches here. Finally, it is 
most important to add to the estimated parameter, estimates of its accuracy. This is absolutely vital 
for applications, but so far no valid approach is available. 

In Section 2 we introduce first, very briefly, the statistical problems we are concerned with. We 
then give a short review of the basic notions of quantum mechanics which are needed in this paper. 
Concepts such as observables, states, measurements and quantum state tomography are explained 
by using finite dimensional complex matrices. At the end of the section we show how to generalize 
this to the infinite dimensional case and describe the experimental set-up of Quantum Homodyne 
Detection pointing out the relation with computerized tomography. 

In Section 3 we present results on consistency of density matrix estimators: projection estimator 
based on pattern functions, and sieve maximum likelihood estimator. The last subsection extends 
the results of the previous ones to estimating the Wigner function. 

Section 4 deals with the detection losses occurring in experiments due in part to the inefficiency 
of the detectors. This adds a deconvolution problem on the top of our tomography estimation. 

Section 5 shows experimental results. We illustrate the behavior of the studied estimators and 
propose some practical tools for the implementation — EM algorithm. The last section finishes with 
some concluding remarks to the whole paper and open problems. The main purpose of the paper 
is to bring the attention of the statistical community to these problem; thus, some proofs are just 
sketched. They fully appear in a complementary paper, Guja (2004). 

2. Physical background 

Our statistical problem is to reconstruct the quantum state of light by using data obtained from mea- 
suring identical pulses of light through a technique called Quantum Homodyne Detection (QHD). 
In particular we will estimate the quantum state in two different representations or parameteriza- 
tions: the density matrix and the Wigner function. The physics behind this statistical problem is 
presented in subsection 2.2 which serves as introduction to basic notions of quantum theory, and 
subsection 2.3 which describes the model of quantum homodyne detection from quantum optics. 
The relations between the different parameters of the problem are summarized in the diagram at the 
end of subsection 2.3 followed by Table 1 containing some examples of quantum states. For the 
reader who is not interested in the quantum physics background we state the statistical problem in 
the next subsection and we will return to it in Section 3. 

2. 1 . Summary of statistical problem 

We observe (Xi,<£>i), . . . , (X„, <£>„), i.i.d. random variables with values in M x [0,7i] and distribution 
Pp depending on the unknown parameter p which is an infinite dimensional matrix p = [Pjjc]jjc=o,...,o° 
such that Trp = 1 (trace one) and p > (positive definite). The probability density of P p is 

pp(jc,4>) = - £ PmV*(*)^W«"'°'"**. (!) 
71 y,*=0 

where the functions {x|/ n } to be specified later, form a orthonormal basis of the space of complex 
square integrable functions on M. Because p is positive definite and has trace 1, this is a probability 
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density: real, nonnegative, integrates to 1. The data (X( : ,<t>p) comes from independent QHD mea- 
surements on identically prepared pulses of light whose properties or state are completely encoded 
in the matrix p called a density matrix. We will consider the problem of estimating p from a given 
sample. 

Previous attempts by physicists to estimate the density matrix p have focused mainly on the 
estimation of the individual matrix elements without considering the accuracy of the estimated den- 
sity matrix with respect to natural distances of the underlying parameter space. In Section 3 we 
will present consistency results in the space of density matrices w.r.t. L\ and L2-norms using two 
different types of estimators, namely projection and sieve maximum likelihood estimators. 

An alternative representation of the quantum state is through the Wigner function W p : M 2 — > R 
whose estimation is close to a classical computer tomography problem namely, Positron Emission 
Tomography (PET), Vardi et al. (1985). In PET one would like to estimate a probability density / 
on R 2 from i.i.d. observations (X\, <£>i ),..., (X„, <£>„), with probability density equal to the Radon 
transform of /: 

/oo 
f(x cos (j) + 1 sin (j) , x sin (j) — t cos §)dt . 
-oo 

Although the Wigner function is in general not positive, its Radon transform is always a probability 
density, in fact ^ [Wp] (x,§) = p p (x,§). As the Wigner function Wp is in one-to-one correspondence 
with the density matrix p, our state reconstruction problem can be stated as to estimate the Wigner 
function Wp. This is an ill posed inverse problem as seen from the formula for the inverse of the 
Radon transform 

W p (q,p) = ^2 / J Pp{x,§)K(qcos$ + psinty-x)dxd$, (2) 

where 

K(x)= l -J^\^xp{i^)d^ (3) 

makes sense only as a generalized function. To correct this one usually makes a cut-off in the range 
of the above integral and gets a well behaved kernel function K c (x) = | f' t \^\ exp(;'^x)c/^. Then the 
tomographic estimator of Wp is the average sampled kernel 

- 1 " 

WcAl'P) = ITT' ££c(?cos3v+/>sin3v-X^). 

ZJj\i Yl £ j 

For consistency one needs to let the 'bandwidth' h = l/c depend on the sample size n and h n — > 
as n — > oo at an appropriate rate. 

In this paper we will not follow this approach, which will be treated separately in future work. 
Instead, we use a plug-in type estimator based on the property 

oo 

Wp(q,p) = £ p kJ W kJ (q,p), (4) 
kj=0 

where Wkj's are known functions and we replace p by its above mentioned estimators. We shall 
prove consistency of the proposed estimators of the Wigner function w. r. t. L2 and supremum norms 
in the corresponding space. 
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2.2. Quantum systems and measurements 

This subsection serves as a short introduction to the basic notions of quantum mechanics which will 
be needed in this paper. For simplicity we will deal first with finite dimensional quantum systems 
and leave the infinite dimensional case for the next subsection. For further details on quantum 
statistical inference we refer to the review Barndorff-Nielsen et al. (2003) and the classic textbooks 
Helstrom (1976) and Holevo (1982). 

In classical mechanics the state of macroscopic systems like billiard balls, pendulums or stellar 
systems is described by points on a manifold or "phase space", each of the point's coordinates 
corresponding to an attribute which we can measure such as position and momentum. Therefore the 
functions on the phase space are called observables. When there exists uncertainty about the exact 
point in the phase space, or we deal with a statistical ensemble, the state is modelled by a probability 
distribution on the phase space, and the observables become random variables. 

Quantum mechanics also deals with observables such as position and momentum of a particle, 
spin of an electron, number of photons in a cavity, but breaks from classical mechanics in that 
these are no longer represented by functions on a phase space but by Hermitian matrices, that is, 
complex valued matrices which are invariant under transposition followed by complex conjugation. 
For example, the components in different directions of the spin of an electron are certain 2x2 
complex Hermitian matrices o X7 O y7 G z . 

Any <f -dimensional complex Hermitian matrix X can be diagonalized by changing the standard 
basis of C d to another orthonormal basis {e\,. . . ,e</} such that Xe; = x,'e; for ;' = l,...,d, with 
Xi e E. The vectors e, and numbers x, are called eigenvectors and respectively eigenvalues of X. 
With respect to the new basis we can write 



X\ 
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. 





X2 
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The physical interpretation of the eigenvalues is that when measuring the observable X we obtain 
(randomly) one of the values x, according to a probability distribution depending on the state of the 
system before measurement and on the observable X. This probability measure is degenerate if and 
only if the system before measurement was prepared in a special state called an eigenstate of X. We 
represent such a state mathematically by the projection P, onto the one dimensional space generated 
by the vector e,- in C d . Given a probability distribution {pi,. . . ,pd} over the finite set {x\,... 
we describe a statistical ensemble in which a proportion pi of systems is prepared in the state P, by 
the convex combination p = £ ( /><P ( -. The expected value of the random result X when measuring 
the observable X for this particular state is equal to piXi which can be written shortly 

E p (X):=Tr(pX). (5) 

Similarly, the probability distribution can be recovered as 

Pi = Tr(pP,) (6) 

thanks to the orthogonality property Tr(P,P 7 ) = 8 i; . 

Now, let Y be a different observable and suppose that Y does not commute with X, that is 
XY ^ YX, then the two observables cannot be diagonalized in the same basis, their eigenvectors 
are different. Consequently, states which are mixtures of eigenvectors of X typically will not be 
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mixtures of eigenvectors of Y and vice-versa. This leads to an expanded formulation of the notion 
of state in quantum mechanics independent of any basis associated to a particular observable, and 
the recipe for calculating expectations and distributions of measurement results. 

Any preparation procedure results in an statistical ensemble, or state, which is described math- 
ematically by a matrix p with the following properties 

(a) p > (positive definite matrix), 

(b) Tr(p) = 1 (normalization). 

In physics p is called a density matrix, and is for a quantum mechanical system an analogue of a 
probability density. Notice that the special state /5,-P; defined above is a particular case of density 
matrix, since it is a mixture of eigenstates of the observable X. The density matrices of dimension 
d form a convex set Sd, whose extremals are the pure or vector states, represented by orthogonal 
projections P(v|/) onto one dimensional spaces spanned by arbitrary vectors \|/ e C d . Any state 
can be represented as a mixture of pure states which are not necessarily eigenstates of a particular 
observable. 

When measuring an observable, for example X, of a quantum system prepared in the state p 
we obtain a random result X e {x\,...,Xd} with probability distribution given by equation (6), 
expectation as in equation (5), and characteristic function 

G(t) := E p (exp(/fX)) = Tr(pexp(;fX)). (7) 

In order to avoid confusion we stress the important difference between X which is a matrix and X 

which is a real-valued random variable. More concretely, if we write p in the basis of eigenvectors 

(X) 

of X then we obtain the map M from states to probability distributions P p over results {x\, . . . ,Xd} 



( pll 

P21 


Pl2 • 
P22 • 
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, , p(X) _ 


( Pl1 \ 

P22 
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■ Pdd ) 
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Notice that Pp is indeed a probability distribution as a consequence of the defining properties 
of states, and it does not contain information about the off-diagonal elements of p, meaning that 
measuring only the observable X is not enough to identify the unknown state. Roughly speaking, 
as dim(5j) = d 2 — 1 = (d — \){d + 1), one has to measure on many identical systems each one 
of a number of d + 1 mutually non-commuting observables in order to have a one-to-one map 
between states and probability distributions of results. The probing of identically prepared quantum 
systems from different 'angles' in order to reconstruct their state is broadly named quantum state 
tomography in the physics literature. 

Let us suppose that we have at our disposal n systems identically prepared in an unknown state 
p € Sd, and for each of the systems we can measure one of the fixed observables X(l), . . . ,X(d + l). 
We write the observables in diagonal form 

X(i) = Y J x l . a P l . a (8) 

a=\ 

where x^ a eigenvalues and P,\ fl eigenstates. We will perform a randomized experiment, i.e. for each 
system we will choose the observable to be measured by randomly selecting its index according to 
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a probability distribution P* over {l,...,d + l}. The results of the measurement on the k th system 
are the pair = (X^,^) where 4>i, . . . ,4>„ are i.i.d. with probability distribution P^ andX^ is the 
result of measuring the observable X(4>j.) whose conditional distribution is given by 

P M (X k = x Ua |* t = = Tr(pP iia ) (9) 

The statistical problem is now to estimate the parameter p from the data Y\ , Y2, . . . , Y n . In the next 
subsection we will describe quantum homodyne tomography as an analogue of this problem for 
infinite dimensional systems. 

2.3. Quantum homodyne tomography 

Although correct and sufficient when describing certain quantum properties such as the spin of a 
particle, the model presented above needs to be enlarged in order to cope with quantum systems 
with 'continuous variables' which will be central in our statistical problem. This technical point 
can be summarized as follows: we replace C d by an infinite dimensional complex Hilbert space 
H , the Hermitian matrices becoming selfadjoint operators acting on H . The spectral theorem tells 
us that selfadjoint operators can be 'diagonalized' in the spirit of (2.2) but the spectrum (the set of 
'eigenvalues') can have a more complicated structure, for example it can be continuous as we will 
see below. The density matrices are positive selfadjoint operators p such that Tr(p) = 1 and can be 
regarded as infinite dimensional matrices with elements p^j := (\|/;,p\|/jt) for a given orthonormal 
basis {\|/i,\|/2, . . . } in H . 

The central example of a system with continuous variables in this paper is the quantum particle. 
Its basic observables position and momentum, are two unbounded selfadjoint operators Q and P 
respectively, acting on L 2 (M), the space of square integrable complex valued functions on R 

(QVi)(x) =*Vi(x), 

(P ¥2)W = -^, 
ax 

for l|fi,\|/2 arbitrary functions. The operators satisfy Heisenberg's commutation relations QP 
PQ = ;1 which implies that they cannot be measured simultaneously. The problem of (separately) 
measuring such observables has been elusive until ten years ago when pioneering experiments in 
quantum optics by Smithey et al. (1993), led to a powerful measurement technique called quantum 
homodyne detection. This technique is the basis of a continuous analogue of the measurement 
scheme presented at the end of the previous subsection where d + 1 observables were measured in 
the case of a <f -dimensional quantum system. 

The quantum system to be measured is a beam of light with a fixed frequency whose observables 
are the electric and magnetic field amplitudes which satisfy commutation relations identical to those 
characterizing the quantum particle, with which they will be identified from now on. Their linear 
combinations X$ = cos(j)Q + sin(j)P are called quadratures, and homodyne detection is about mea- 
suring the quadratures for all phases £ [0, Jt]. The experimental setup shown in Figure 1 contains 
an additional laser called a local oscillator (LO) of high intensity \z\ 3> 1 and relative phase (j) with 
respect to the beam in the unknown state p. The two beams are combined through a fifty-fifty beam 
splitter, and the two emerging beams are then measured by two photon detectors. A simple quantum 
optics computation (see Leonhardt (1997)) shows that in the limit of big LO intensity the difference 
of the measurement results (countings) of the two detectors rescaled by the LO intensity X = 
has the probability distribution corresponding to the measurement of the quadrature X<^. The result 
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local 
oscilator 

z = \z\e<* 



Fig. 1. Quantum Homodyne Tomography measurement system 



X takes values in R and its probability distribution P p (-|(j)) has a density /? p (x|(j)) and characteristic 
function (see equation (7)) 

Gp,x^(0=Tr(pexp(^)). (10) 

The phase (j) can be controlled by the experimenter by adjusting a parameter of the local oscillator. 
We assume that he chooses it randomly uniformly distributed over the interval [0, jc]. Then the joint 
probability distribution P p for the pair consisting in measurement result and phase Y :— (X,<t>) has 
density p p (x,§) equal to ^/?p(^|(j)) with respect to the measure dx x d§ on E x [0,7t]. An attractive 
feature of the homodyne detection scheme is the invertibility of the map T that associates P p to p, 
making it possible to asymptotically infer the unknown parameter p from the i.i.d. results Y\,...,Y„ 
of homodyne measurements on n systems prepared in the state p. 

We will see now why this state estimation method is called quantum homodyne tomography 
by drawing a parallel with computerized tomography used in the hospitals. In quantum optics 
it is common to represent the state of a quantum system by a certain function on M 2 called the 
Wigner function W p (q,p) which is much like a joint probability density for Q and P, for instance its 
marginals are the probability densities for measuring Q and respectively P. The Wigner function of 
the state p is defined by demanding that its Fourier transform f2 with respect to both variables has 
the following property 



W p (u,v) := 72[Wp](«,v) =Tr(pexp(-iKQ-ivP)). 



(11) 



We see from this equation that if Q and P were commuting operators then W(q,p) would indeed be 
the joint probability density of outcomes of their measurement. As the two observables cannot be 
measured simultaneously, we cannot speak of a joint density, in fact the Wigner function need not 
be positive, but many interesting features of the quantum state can be visualized in this way. Let 
(w,v) = (f cos(j),f sin0), then 



W(M,v) = Ji[p(-^)](0=Tr(pexp(-^X (> )) 



(12) 



where the Fourier transform T \ in the last term is with respect to the first variable, keeping (j) fixed. 



10 L.M. Artilesetal. 

The equations (11) and (12) are well known in the theory of Radon transform %_ and imply that 
for each fixed (j), the probability density p p (x,§) is the marginal of the Wigner function with respect 
to the direction (j) in the plane, 

Pp(x,ty) = Oi\Wp](x,ty) = / W p (xcos(j) +ysin(j),xsin(j) — ycos$)dy, (13) 

adding quantum homodyne tomography to a number of applications ranging from computerized 
tomography to astronomy and geophysics, Deans (1983). In computerized tomography one recon- 
structs an image of the tissue distribution in a cross-section of the human body by recording events 
whereby pairs of positrons emitted by an injected radioactive substance hit detectors placed in a ring 
around the body after flying in opposite directions along an axis determined by an angle (j) € [0,7i]. 
In quantum homodyne tomography the role of the unknown distribution is played by the Wigner 
function which is in general not positive, but has a probability density /? p (x|(j)) as marginal along 
any direction (j). 

The following diagram summarizes the relations between the various objects in our problem: 



P * 




(XX,*!),..., &,,*„). 



Finally in Table 1 we give some examples of density matrices and their corresponding Wigner 
function representations for different states. The matrix elements p k j are calculated with respect to 
the orthonormal base corresponding to the wave functions of k photons states 

Mf k {x)=H k {x)e- x2 ' 2 (14) 

where H k are the Hermite polynomials normalized such that / \|/ 2 = 1 . A few graphical representa- 
tions can be seen in Figure 3. 



State 


Density matrix p k j 


Wigner function W(q,p) 


Vacuum state 
Single Photon state 
Thermal state p > 
Coherent state, JVel + 
Squeezed state 

NeR+, \ e E, 


pofl — 1 , rest zero 
Pi i = 1, rest zero 
8£(1 - e-V)e-V k 

exp(-A0^ 
C(A^)(±tanh©)*+.'x 


±exp(-q 2 - p 2 ) 
l(2q 2 + 2p 2 -l)cxp(-q 2 -p 2 ) 
i tanh(P/2) exp[- (q 2 + p 2 ) tanh(p/2)] 
±exp(-(q-VN) 2 -p 2 ) 
Iex P (-^(^-a) 2 - e -V) 



Table 1 : Density matrix and Wigner function of some quantum states 



The vacuum is the pure state of zero photons, notice that in this case the distributions of Q and 
P are Gaussian. The thermal state is a mixed state describing equilibrium at temperature T = 1/p, 
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having Gaussian Wigner function with variance increasing with the temperature. The coherent state 
is pure and characterizes the laser pulse. The photon number is Poisson distributed with an average 
of N photons. The squeezed states have Gaussian Wigner functions whose variances for the two 
directions are different but have a fixed product. The parameters N and £ satisfy the condition 

N > sinh 2 ©, C{N£) is a normalization constant, a = ggjgS ' and (^j) 1/2 - 
3. Density matrix estimation 

D'Ariano et al. (1994) presented the density matrix analogue of formula (2) of the Wigner function 
as inverse Radon transform of the probability density p p 

p= f dx [* -±p p (x,$)K(Xt-xl), (15) 

J-oo JO JC 

where K is the generalized function given in equation (3) whose argument is a selfadjoint operator 
— xl. The method has been further analyzed in D'Ariano (1995), Leonhardt et al. (1995), see 
also D'Ariano et al. (1995). We recall that in the case of the Wigner function we needed to regularize 
the kernel K by introducing a cut-off in the integral (3). For density matrices the philosophy will be 
rather to project on a finite dimensional subspace of L2 (M) whose dimension N will play the role of 
the cut-off. In fact all the matrix elements of the density matrix p with respect to the orthonormal 
basis {v|/,t}" =0 defined in (14), can be expressed as kernel integrals 

PkJ = r d X f % p{xA)fkj{x)e -^)\ (i 6) 

J —00 JO K 

with /ft 1 = fjk bounded real valued functions which in the quantum tomography literature are 
called pattern functions. The singularity of the kernel K is reflected in the asymptotic behavior of 
fkj as k,j — > 00. A first formula for f^j was found in Leonhardt et al. (1995) and uses Laguerre 
polynomials. This was followed by a more transparent one due to Leonhardt et al. (1996), 

/*,y(*) = ^(V*(*)<fy(*)). 

for j > k, where and (p ; represent the square integrable and respectively the unbounded solutions 
of the Schrodinger equation, 
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2dx 



\\t = (£)\\l, COGR. (18) 



Figure 2 shows pattern functions for different values of k and j. We notice that the oscillatory part 
is concentrated in an interval centered at zero whose length increase with k and j, the number of 
oscillations increases with k and j and the functions become more irregular as we move away from 
the diagonal. It can be shown that tails of the pattern function decay like x~ 2 ~^ k ~^ . More properties 
of the pattern function can be found in Leonhardt et al. (1996) and Guja (2004). 



3. 1. Pattern function projection estimation 

Equation (16) suggests the unbiased estimator of p, based on n i.i.d. observations of (X,<£>), 
whose matrix elements are: 

pij = l ~i F 'cA^e), (19) 
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q q 
Fig. 2. Pattern functions 



where F kJ (x,§) = fk,j{x)e- i{ J~ k ^ , see D'Ariano et al. (1995), Leonhardt et al. (1995, 1996). By 
the strong law of large numbers the individual matrix elements of this estimator converge to the 
matrix elements of the true parameter p. However the infinite matrix need not be positive, 
normalized, or even selfadjoint, thus it cannot be interpreted as a quantum state. These problems 
are similar to those encountered when trying to estimate an unknown probability density by using 
unbiased estimators for all its Fourier coefficients. The remedy is to estimate only a finite number of 
coefficients at any moment, obtaining a projection estimator onto the subspace generated by linear 
combinations of a finite subset of the basis vectors. In our case we will project onto the space of 
matrices of dimension N = N(n) with respect to the basis {vo}™ = o> 



(A, ' n) --E^.;(^), 



forO<kJ<N-\, 



and p["]=0for(feA j) >N. 

In order to test the performance of our estimators we introduce the L\ and L2 distances on the 
space of density matrices. Let p and x be two density matrices with p — x = A.,P, the diagonal 
form of their difference, and notice that some of the eigenvalues are positive and some negative such 
that their sum is zero due to the normalization of the density matrices. We define the absolute value 
|p — x| := |A,;|P; and the norms 



|p-x||i:=Tr(|p-x|), 
|p-x|| 2 := [Tr(|p-x| 2 )] 1/2 = 



1 1/2 



j,k=0 



(20) 
(21) 



Let us consider now the mean integrated square error (MISE) and split it into the bias and 
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variance parts: 

E(L lpg' n) -P^ 2 )= E IPMl 2 + E(E \pff ) -p k f)=b\n)W{n). (22) 

y>*=0 (k/\j)>N j,k=0 

By choosing 7V(n) — > °° as n — > °° the bias /3 2 («) converges to zero. For the variance we have the 
upper bound 

N-i 



° 2 (") = - E / / \ F k,j{xA)-Pk,j\ 2 Pp( x A)d§dx 
n jj~L Jo jr 

I N-l „7C . 

< - E / / I^jM)! p P (*,<t>)<*<M* 

n jj~LoJo Jr 

= - e' / *f/*j(*) 2 /"%p(*.40#- (23) 

The proof of the following lemma on the norms of the pattern functions can be found in Guja (2004). 
LEMMA 3 . 1 . There exist constants C\ , C2 , C3 such that 

E ll/*jll-<Citf 7/3 , (24) 

*,y>o 

E IIA./II2<C 2 ^V 3 , (25) 

k,j>0 

J p p {x,^)d^<C 3 , forallxeK. (26) 

By applying the lemma to equation (23) we conclude that the estimator p^ ") is consistent with 
respect to the L2 distance if we choose N(n) — > °° as n — > °° such that AT(«) = o(n 3 / 7 ). Based on 
the property Hp^'^H 1 < VN\\p( N,n ^\\2 we can prove a similar result concerning L\ -consistency, see 
Gu|a(2004). 

THEOREM 3.2. Let N(n) °° be the dimension of the pattern function projection estimator. If 
N(n) =o(r?l 1 ), then 

E(||p(^)-p||2)-»0, as»^oo. (27) 

IfN(n) = o(n 3 / 10 ) f/zen 

E (||p(^»)_p|| 1 ) ^0, asn^oc. (28) 

Rates of consistency can be obtained by assuming that the state belongs to a given class for which 
upper bounds of the bias can be calculated and N(n) is chosen such as to balance bias and variance. 
This problem will be attacked in future work within the minimax framework. In Section 5 we 
present a data-dependent way of selecting the dimension of the projection estimator based on the 
minimization of the empirical L2-risk using a cross-validation technique. 
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3.2. Sieve maximum likelihood estimation 

We will consider now a maximum likelihood approach to the estimation of the state p. Let us 
recall the terms of the problem: we are given a sequence Y\,Y2...,Y„ of i.i.d. random variables 
Yj = (X,-, <£>,-) with values in K x [0, Jl] and probability density p p depending on the parameter p 
which is an infinite dimensional density matrix. We would like to find an estimator of p 

pW = pW(y 1 ,y 2 ,...,y„). 
Let p and x be density matrices and denote by 

h(P p ,P x ) :=(^J {^p- p -^H} 2 dxd^j ' , (29) 

the Hellinger distance between the two probability distributions. The following relations are well 
known 

dtAPp.Pz) = ^bp-PxIli <h(P p ,P x ) < tJ\\pp-p,\\i, (30) 

An important property which is true for any measurement is the following inequality between the 
classical and quantum distances, cf. Guja (2004), 

\\Pp-Pz\\i < ||p-x||i. (31) 

From (30) and (3 1) we obtain 

h(Pp,P x ) < V\\P-^\\u (32) 

for arbitrary states p,x. As we have seen previously, the inverse map from P p to p is unbounded 
thus we do not have an inequality in the opposite direction to the one above. However we can prove 
the continuity of the inverse map by using a matrix analogue of the Scheffe's lemma from classi- 
cal probability (see Williams (1991)) stating that if a sequence of probability densities converges 
pointwise almost everywhere to a probability density, then they also converge in total variation 
norm. The matrix Scheffe's lemma which can be found in Simon (1979) says that if p,p^\p^ 2 \ . . . 

are density matrices (positive and trace one) and if the coefficients pj" converge to p, j as n goes 

to infinity, for any fixed indices i,j, then ||pW — P* 1 1 1 — > 0. But by equation (16) if the sequence 

in) 

P p („) converges to P p as n — > °° with respect to the <i tv -distance then p\J converges to p, j and thus 
||pM — p|| i — > 0, completing the proof of the continuity of the map from P p to p. In particular we 
have || p(") — p 1 1 2 — > due to the inequality between the L\ and L2 norms 

||x-p||2<||x-p||i. (33) 
The maximum likelihood estimator of p is defined as 

n 

argmax£log/? x pQ,<Iv) (34) 

T l=\ 

where the maximum is taken over all density matrices in on the space L2OR). However there exist 
density matrices x such that the probability density p z takes arbitrarily high values at all the points 
(Xf ,<Pf ). To see this let us first remind the reader that any density matrix is a convex combination 
of "pure states" which are projections P(\|/) on one dimensional sub-spaces of La(M.) generated by 
vectors \\t which can be written as a Fourier sum 

00 

V|/(x) = £ a k y k {x) (35) 

k=0 
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in the basis {v(/,t} given in equation (14), with / |\|/(x)| 2 c/x = Y,k \ a k\ 2 = L For any such state the 
corresponding probability density is 

P V M) = £a^%(*) =|¥*| 2 M, (36) 

jt=0 

where \^{x) is the square integrable function with Fourier coefficients e'^a^. It is clear that there 
exists a one-to-one relation between \|/ and x\fi which preserves the L2-norms, thus we can choose 
vectors q>i,. . .cp„ such that / |(pj>(x)| 2 iix = 1 and \<pf e (X()\ 2 > C for all £ = 1, . . .n and arbitrary 
C > 0. Then the density matrix 

p = -£p(q>*) (37) 
representing a statistical mixture of the pure states leads to the likelihood 

pp(*.«=n(iii^)i 2 )>(^)"- (38) 

£ 4=1 

which can be arbitrarily high for any fixed n. This drawback can be corrected by using for example 
penalized maximum likelihood estimators or by restricting the state space to some subspace Q, (n) of 
density matrices such that for any amount of data the maximum of the likelihood over Q_(n) exists 
and U„Q,(n) is dense in the space of all density matrices S with respect to some chosen distance 
function. Such a method is called sieve maximum likelihood and we refer to van de Geer (2000) 
and Wong and Shen (1995) for the general theory. The choice of the sieves Q_(n) should be tailored 
according to the problem one wants to solve, the class of states one is interested in, etc. Here we 
will use the same subspaces as for the projection estimator of the previous subsection, that is Q,(n) 
consists of those states with maximally N(n) — 1 photons described by density matrices over the 
subspace spanned by the basis vectors \|/o, . . . ,^N(n)-i- We will call {Q,(")}«>o the number states 
sieves and the dimension N(n) will be an increasing function of n which will be fixed later so as to 
guarantee consistency. 

d(n) = {xes : = for all j > N(n) ork> N(n)} . (39) 

Notice that the dimension of the space Q,(n) is N(n) 2 — 1. Let now the estimator be 

pW := arg max V logp,^,*/), (40) 

where the maximum can be shown to exist for example by using compactness arguments. We will 
denote the corresponding sieve in the space of probability densities by 

¥(n) = {p z : xeQ,(n)}. (41) 

The theory of M-estimators van de Geer (2000) tells us that the consistency of ML estimators 
depends essentially on the "size" of the parameter space, in our case the sieves Q,(n) or 2> («), which 
is measured by entropy numbers with respect to some distance, for example the L\ -norm on density 
matrices or the Hellinger distance between probability distributions. 

DEFINITION 3.3. Let g be a class of density matrices. Let Nb,i(?>, Q) be the smallest value of 
p e N for which there exist pairs of Hermitian matrices (not necessarily density matrices) [x^,T^] 



16 



L.M. Artiles et al. 



with j = 1, . . . ,p such that ||x^ — X^||i < 8 for all j, and such that for each X € § there is a j = 
E {l,...,p} satisfying 

xj<x<xf. 

Then Hba(&, Q ) = logA^l (8, Q ) is called h-entropy with bracketing of Q . 

We note that this definition relies on the concept of positivity of matrices and the existence of the 
L\ -distance between states. But the same notions exist for the space of integrable functions thus 
by replacing density matrices with probability densities and selfadjoint operators with functions 
we obtain the definition of the 8-entropy with bracketing H B \ (8, f ) for some space of probability 
densities f , see van de Geer (2000). Moreover by using equation (31) and the fact that the linear 
extension of the map from density matrices to probability densities sends a positive matrix to a 
positive function, we get that for any 8-bracketing [XpX 1 -} for Q.(«), the corresponding functions 
[ppp'j] form a 8-bracketing for fp(n), i.e. they satisfy \\pj — P^jWi < 8 and for any p e i>(n) there 
exists a j = j(p) such that p L - < p < p v - . Thus 

ffB,i(S,!P(n))<Zfc,i(8,Q,(n)). 

The following proposition gives an upper bound of the "quantum" bracketing entropy and in conse- 
quence for Hb,i (8, 2> (n)). Its proof can be found in Guja (2004) and relies on choosing a maximal 
number of nonintersecting balls centered in Q_(n) having radius 2 N( n ) an< ^ tnen P rov iding a pair of 
brackets for each ball. 

PROPOSITION 3.4. Let Q,(n) be the class of density matrices of dimension N(n). Then 

H B ,i(8, d («) ) <CN(n) 2 log S . (42) 
for some constant C independent ofn and 8. 

By combining the previous inequalities with equation (32) we get the following bound for the brack- 
eting entropy of the class of square-root densities 

iP 1/2 («) = {VP p : Pp (43) 

with respect to the L2-distance 

H B<2 (8,* 1/2 {n)) <C 2 jV(») 2 log^ • (44) 

We will concentrate now on the Hellinger consistency of the sieve maximum likelihood estimator 
P n . We will appeal to a theorem from van de Geer (2000), which is similar to other results in the 
literature on non-parametric M-estimation (see for example Wong and Shen (1995)). There are 
two competing factors which contribute to the convergence of h(P n ,P p ). The first is related to the 
approximation properties of the sieves with respect to the whole parameter space. Such a distance 
from p to the sieve Q,(n) can take different expressions, for example in terms of the % 2 -distance 
between the corresponding probability measures 



X„ :=argmin % 2 (p p ,p„), 



(45) 
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where the % 2 -distance between two probability distributions is given by 



1 2 {PuPi)--- 



7(^-l)2JP 2 p i «p 2 , 



otherwise. 
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(46) 



Notice that X„ depends on n through the growth rate of the sieve N(n). The second factor 
influencing the convergence of h(P n ,Pp) is the size of the sieves which is expressed by the bracketing 
entropy. The non-parametric sieve maximum likelihood estimation theory shows that consistency 
holds if there exists a sequence 8„ — > such that the following entropy integral inequalities are 
satisfied for all n 



J B {h n y' 2 {n)):= / 5 " H l B l 2 {u,<e{n) l l 2 )du V6„<^/c. 



where c is some universal constant, van de Geer (2000). From (44) we get 



J B {h n y 2 {n))=0 



N{n)b„ log 



N(n) 



1/2' 



which implies the following constraint for N(n) — > °° and 8„ — > , 

AT(n) 



5„ 



O 



logn 



(47) 



(48) 



(49) 



THEOREM 3.5. Suppose that the state p satisfies % n — > 0. Lef p'"^ fee f/ie sieve ML£ w;//i Af(«) 
andh n satisfying (47), then 

nh(P n ,Pp) > 5„ + T„) < cexp(-n82/c 2 ). (50) 
Proo/ Details can be found in Guja (2004) based on Theorem 10.13 of van de Geer (2000). 

□ 

From the physical point of view, we are interested in the convergence of the state estimator 
p(") with respect to the L\ and L2-norms on the space of density matrices. Clearly the rates of 
convergence for such estimators are slower than those of their corresponding probability densities. 
As shown in the beginning of this subsection the map sending probability densities p p to density 
matrices p is continuous, thus an estimator p„ taking values in the space of density matrices S is 
consistent in the L\ or L2-norms if and only if P n converges to P p almost surely with respect to the 
Hellinger distance. 

COROLLARY 3.6. The Hellinger consistency of P n is equivalent to the || • ^-consistency of 
p("). In particular, if £„ ex p(— nh 2 /c 2 ) < °° and the assumptions of Theorem 3.5 hold, then we have 
I|p (n) -Plli ->0, a.s.. 



3.3. Wigner function estimation 

The Wigner function plays an important role in quantum optics as an alternative way of representing 
quantum states and calculating an observable's expectation: for any observable X there exists a 
function Wx from R 2 to R such that 



Tr(Xp) =JI W x (q,p)W p (q,p)dqdp. 



(51) 
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Besides, physicists are interested in estimating the Wigner function for the purpose of identifying 
features which can be easier visualized than read off from the density matrix, for example a "non- 
classic" state may be recognized by its patches of negative Wigner function, while "squeezing" is 
manifest through the oval shape of the support of the Wigner function, see Table 1 and Figure 3. 
As described in Subsection 2.3 the Wigner function should be seen formally as a joint density of 
the observables Q and P which may take non-negative values, reflecting the fact that the two ob- 
servables cannot be measured simultaneously. However the Wigner function shares some common 
properties with probability densities, in particular their marginals [W p (q,p)dq and fW p (q,p)dp 
are probability densities on the line. In fact this is true for the marginals in any direction which are 
nothing else then the densities p p (x,§). On the other hand there exist probability densities which 
are not Wigner functions and vice-versa, for example the latter cannot be too "peaked": 

\W p {q,p)\ < -, for all (q,p) G R 2 , p G 5. (52) 
K 

As a corollary of this uniform boundedness we get 

NWp — Well- < — Hp— -elk- (53) 

K 

for any density matrices p and x. Indeed we can write p — x = p+ — p_ where p + and — p_ represent 
the positive and negative part of p — x. Then 

||Wp - Wx||- = \\Wp + - Wp_||oe < \\W p+ \\ + \\W P _\\„ < 

^(llp+lll + llp-lll) = ^llp-tlll- 

Another important property is the fact that the linear span of the Wigner functions is dense in 
L 2 (R 2 ), the space of real valued, square integrable functions on the plane, and there is an isometry 
(up to a constant) between the space of Wigner functions and that of density matrices with respect 
to the L2 -distances 

HWp- Weill =: JJ\W p (q, P )-W T (q,p)\ 2 dpdq=^\\p-x\\ 2 2 . (54) 

In Section 2 we have described the standard estimation method employed in computerized tomog- 
raphy which used a regularized kernel K c with bandwidth h n ~ l/c converging to zero as n — > °° at 
an appropriate rate. This type of estimators for the Wigner function will be analyzed separately in 
future work in the minimax framework along the lines of Cavalier (2000). The estimators which we 
propose in this subsection are of a different type, they are based on estimators for p plugged into the 
following linearity equation 

W p (q,P) =Y,PkjW kJ (q,p), 

where W^j are known functions corresponding to the matrix with the entry (k, j) equal to 1 and all 
the rest equal to zero, see Leonhardt (1997). The isometry (54) implies that the family {Wjtj}jj° - =0 
forms an orthogonal basis of L2 (R 2 ) . Following the same idea as in the previous section we consider 
the projection estimator 

kj=o \ n e=i ) 
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COROLLARY 3.7. LetN{n) be such thatN{n) — » oo and 

N{n)=o(n i l 1 ). 

Then 

E||wW-W p ||2 = ||W p W -W p ||| + 0^^^^. 
Proof: Apply isometry property and Theorem 3.2. 

□ 

Similarly we can extend the SML estimator of the density matrix to the Wigner function. Define 
the subspace 

W(«) = {W p : pea(»)}, 

with Q_(n) as in equation (39), and define the corresponding SML estimator as — W^( n ) where 
pW was defined in (40). 

COROLLARY 3.8. Suppose that p satisfies % n — » 0. Lef W p fee f/ze SML estimator with N(n) 
andh n satisfying (47) anc/£„exp(— nS^/c 2 ) < oo. 77zen we /zave 

ll^-Wplb-^o 

almost surely. Under the same conditions 

\\W {n) -W p \\„^0 

almost surely. 

Proof: Apply the inequalities (53, 54) and Corollary 3.6. 



□ 



4. Noisy observations 

The homodyne tomography measurement as presented up to now does not take into account various 
losses (mode mismatching, failure of detectors) in the detection process which modify the distribu- 
tion of results in a real measurement compared with the idealized case. Fortunately, an analysis of 
such losses (see Leonhardt (1997)) shows that they can be quantified by a single efficiency coeffi- 
cient < r| < 1 and the change in the observations amounts replacing X, by the noisy observations 



Z/:=0TX i + v /(l-Tl)/24« 



(55) 



with \i a sequence of i.i.d. standard Gaussians which are independent of all Xj. The problem is again 
to estimate the parameter p from Y[ = (X- , <£>,-) , for i = 1 , . . . , n. The efficiency-corrected probability 
density is then the convolution 



p(x,4>)exp 

-oo 



l-Tl 



dx. 



(56) 



The physics of the detection process detailed in Leonhardt (1997) offers an alternative route 
from the state to the probability density of the observations Yj. In a first step one performs a 
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Bernoulli transformation B(r\) on the state p which is a quantum equivalent of the convolution with 
noise for probability densities, and obtains a new density matrix pi. To understand the Bernoulli 
transformation let us consider first the diagonal elements {pt — p^.k, k = 0, 1..} and {qj = Pjj, j = 
0, 1..} which are both probability distributions over N and represent the statistics of the number of 
photons in the two states. Let b k k +p — ( k \ p )^ k (i — T|) p be the binomial distribution. Then 

qj = Ztf{i\)Pk (57) 

k=j 

which has a simple interpretation in terms of an "absorption" process by which each photon of the 
state p goes independently through a filter and is allowed to pass with probability r| or absorbed 
with probability 1 — T|. The formula of the Bernoulli transformation for the whole matrix is 

00 r i 1/2 

p?,*=Lk +p (n)^ +p (n) P j+P , k+P - (58) 

The second step is to perform the usual quantum tomography measurement with ideal detectors 
on the "noisy" state pi obtaining the results Yj with density p p (x,$;r[). It is noteworty that the 
transformations B(r|) form a semigroup, that is they can be composed as Z?(r|i)B(r|2) = B(r|ir|2) 
and the inverse of B(r\) is simply obtained by replacing r| with r|~ 1 in equation (58). Notice however 
that if r| < 1/2 the power series (1 — T| _1 )* appearing in the inverse transformation diverges, thus 
we need to take special care in this range of parameters. 

A third way to compute the inverse map from /? p (x,(j);r|) to p is by using pattern functions 
depending on r| which incorporate the deconvolution map from p p (x,§;r\) to p p (x,(j); 1): 

f°° f 71 d<i> 

Pkj::Z J-J X Jo x p p( x >^ fk 'j( x >^- (59) 

Such functions are analyzed in D'Ariano (1995); D' Ariano et al. (1995) where it is argued that the 
method has a fundamental limitation for r| < 1/2 in which case the pattern functions are unbounded, 
while for r| > 1 /2 numerical calculations show that their range grows exponentially fast with both 
indices j,k. 

The two estimation methods considered in Section 3 can be applied to the state estimation with 
noisy observations. The projection estimator has the same form as in Subsection 3.1 with a similar 
analysis of the mean L2-risk taking into account the norms of the new pattern functions fkj{x',T[). 
The sieve maximum likelihood estimator follows the definition in Subsection 3.2 and a consistency 
result can be formulated on the lines of Corollary 3.6. We expect however that the rates of conver- 
gence will be dramatically slower and we will leave this analysis for a separate work. 



5. Experimental results 

In this section we study the performance of the Pattern Function Projection estimator and the Sieve 
Maximum Likelihood estimator using simulated data. In Table 1 we showed some examples of 
density matrices and Wigner functions of quantum states. In Figure 3, we display their correspond- 
ing graphical representation. For some of them the corresponding probability distribution can be 
expressed explicitly and it is possible to simulate data. In particular we shall simulate data from 
QHT measurements on squeezed states with efficiency rj = 1. 
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Fig. 3. Graphical representation of quantum states 
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5.1. Implementation 

In order to implement the two estimators we need to compute the basis functions \|/„ and the func- 
tions (j)„, which are solutions of Schrodinger equation (18). For this, we use an appropriate set of 
recurrent equations, see Leonhardt (1997), Ch. 5. Pattern functions can then be calculated as 

f kJ (x) = 2xyj(x)ty k {x) - v / 2C7+T)Yj+i(x)<M*) - V 2 ( k + l )Vj( x )fa+i(x), 

for all j > k, otherwise f k j(x) = fj, k (x) and then F k j(x,Q) = f k j(x)e'^ k ~^ e . 

On the practical side, finding the maximum of the likelihood function over a set of density 
matrices is a more complicated problem due to the positivity and trace one constraints which must 
be taken into account. A solution was proposed in Banaszek et al. (1999), where the restriction on 
positivity of a density matrix p is satisfied by writing the Cholevski decomposition 

p = T*T (60) 

where T of upper triangular matrices of the same dimension as p with complex coefficients above 
the diagonal and reals on the diagonal. The normalization condition Trp = 1 translates into 1 1 T \ | 2 = 1 
which defines a ball in the space of upper triangular matrices with the L2-distance. We will denote 
by 1 (n) the set of such matrices having dimension N(n). The sieve maximum likelihood estimator 
is the solution of the following optimization problem with N = N(n) 

n 

f = argmax £log/? p (X^,3v) 

Tei(n) e=l 



N-l 



TT (n) (=1 k=l 



k 



argmax £log£ £ T k , mWm (X e )e im ^ 



m=0 



2 



The numerical optimization was performed using a classical descendent method with con- 
straints. Notice that we have an optimization problem on A^ 2 — 1 real variables. Given the problem 
of high dimensionality and computational cost we propose an alternative method to the procedure 
mentioned above. It exploits the mixing properties of our model. Any density matrix of dimen- 
sion N can be written as convex combination of N pure states, i.e. p = Y^=d PrPn where p r > 0, 
Y*rPr — 1 an d Pr is a one dimensional projection whose Cholevski decomposition is of the form 
p r = t*t r where t r is the row vector of dimension on which p r projects, and t* is the column vec- 
tor of the complex conjugate of t r . It should be noted that even though decomposition of our state 
in pure states is not unique this is not a problem given we are actually not interested in this repre- 
sentation but in the resulting convex combination, the state itself. Now we can state the problem as 
to find the maximizer of the loglikelihood 



n N-l n N-l N(n)-l 

L({(X e ,^ e )};p,t) - £ log £ PrPpMe,®e) = £ log £ Pr E t r ^ m {X t )e im ^ 

i=l r=0 1=1 r=0 m=0 

where t r , m represents the m th coordinate of t r . We now maximize over all P G B where 
B = \{p r ,t r }^=o ■ Pr>0, £p r = l,||^|| 2 = l, forallrl. 



2 



We propose an EM algorithm as an alternative method to the one presented in Banaszek et al. 
(1999). See, Dempster et al. (1977) for an exposition on the formulation of the EM algorithm for 
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problems of ML estimation with mixtures of distributions. The iteration procedure is given then in 
the following steps. 

1) Compute the expectation of the conditional likelihood: 

1 n AT(n)-l old old <*> \ 

2) Maximize g(p|p old ) over all p e B and obtain P new with components 

1 " P ° ld P ° M (X e ,<i> e ) 

n 

t*t r , t r = arg max £ /°]l d log/?p r pQ,<I> £ ), 

Il f ll2 =1 £ = 1 

where 

' L%t l pfpff{X e M) 

As initial condition one could take very simple ad hoc states. For example, take to as the vector 
( 1 , 0, . . . , 0) and po = 1 and t r to be the null vector and p r = for r > 0. This corresponds to the one 
photon state. Another possible combination is to take f,.„, = 8f and p r = i. This corresponds to the 
state represented by a diagonal matrix, called a chaotic state. Another strategy is to consider a pre- 
liminary estimator, based on just few observations, diagonalize it and take t r equal to its eigenstates 
and p r the corresponding eigenvalues. In this way one hopes to start the iteration from a state closer 
to the optimum one. In terms of speed our simulations suggests to use the EM version as dimension 
grows. Establishing any objective comparison between direct optimization and EM algorithm has 
proven to be difficult given the dependency on initial conditions, and high dimensionality of the 
problem. 



5.2. Analysis of results 

In Figure 5 we show the result of estimating the squeezed state defined in Table 1 using samples 
of size 1600, for both Pattern Function and Maximum Likelihood estimators. At a first glance 
one can see that the Pattern Function estimator result is rougher when compared to the Maximum 
Likelihood estimator. This is due to the fact discussed in Subsection 3.1 that the variance of F^j 
increases as a function of k and j as we move away from the diagonal. The relation between quality 
of estimation and dimension of the truncated estimator is seen more clearly in Figure 6 where the 
L2-errors of estimating the coherent state is shown for both estimators at different sample sizes. The 
*-dot represents the point of minimum - and thus, optimum dimension jV* (n) - for each curve. The 
curves presented there are the mean L2-error estimated using 15 simulations for each sample size. 
From there we can see that the optimum PFP estimator for the sample of size n = 1600 is the one 
corresponding to N*(n) = 15 while the optimum SML would be obtained using the sieve of size 
N*(n) = 19. 

Let us first analyze the performance of PFP estimator. Notice that for N > N* the mean Li- 
enor increases quadratically with due to contribution from the variance term. As n increases 
the variance decreases like « _1 for a fixed dimension N and consequently, the optimal dimension 
N*{n) increases. One can see that the minimum is attained rather sharply. This suggests that, in 
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order to get a good result a refined method of guessing the optimum dimension becomes necessary, 
eg. BIC, AIC or cross-validation. Figure 4 shows a cross-validation estimator for the Z^-error of the 
PFP estimator for three simulations (continuous lines), each one based on 1600 observations with a 
squeezed state, and for comparison the expected L2-error (dashed line). This represents only a first 
attempt to implement model selection procedures for this problem which should be investigated 
more thoroughly from the theoretical and practical point of view. 

Crossvalidation estimation of L_ 2 -error, sample size n=1600 
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Fig. 4. Estimated optimal dimension for three simulations with 1 600 observations 



We pass now to the SML estimator. In Figure 6 we see that it has smaller L2-error the the PFP 
estimator at its optimum sieve dimension. It is remarkable that the behavior of the L2-error, for 
N > N* has a different behavior in this case, increasing much slower than the PFP estimator at the 
right side of its corresponding optimal dimension. This suggests that SML estimators could have 
a lower risk if the optimum dimension is overestimated. The bottom right pane of Figure 6 shows 
the optimum value of the L2-risk in terms of sample size. Both axis are represented in a logarithmic 
scale. The observed linear pattern indicates that the L2-risk decreases as arC x . The slope of both 
curves correspond to X ~ 0.4, showing an almost parametric rate which is not surprising given the 
smoothness of the example that we consider. The value of x for the PFP estimator is a bit smaller 
than for the SML estimator, confirming its worse performance. Notice also that the constant a is 
bigger for PFP than for the SML estimator. We expect that the contrast between the two estimators 
will be accentuated when r| < 1 . 

Finally, in Figure 7 we show the result of estimating the Wigner function of the squeezed state 
using both estimators. As explained in Section 3.3 the corresponding estimator can be obtained by 
plugging-in the density matrix estimator in equation 4. The density matrix estimators for the same 
state are represented in Figure 5. 

6. Concluding remarks 

In this paper we have proposed a Pattern Function Projection estimator and a Sieve Maximum 
Likelihood estimator for the density matrix of the quantum state and its Wigner function. We proved 
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Fig. 5. Estimation of Squeezed state. First column, from top to bottom, PFP and SML estimators. 
Second column, corresponding errors 



they are consistent for different norms in their corresponding spaces. There are many open statistical 
questions related to quantum tomography and we would like to enumerate a few of them here. 

• Cross-validation. For both types of estimators, a data dependent method is needed for select- 
ing the optimal sieve dimension. We mention criteria such as unbiased cross-validation, hard 
thresholding or other types of minimum contrast estimators Barron et al. (1999). 

• Efficiency < r| < 1. A realistic detector has detection efficiency < r| < 1 which introduces 
an additional noise in the homodyne data. From the statistical point of view we deal with a 
Gaussian deconvolution problem on top of the usual quantum tomography estimation. 

• Rates of convergence. Going beyond consistency requires the selection of classes of states 
which are natural both from the physical, as well as statistical point of view. One should 
study optimal and achieved rates of convergence for given classes. For < r| < 1 the rates 
are expected to be significantly lower than in the ideal case, so it becomes even more crucial 
to use optimal estimators. In applications, sometimes only the estimation of a functional of 
p such as average number of photons or entropy may be needed. This will require a separate 
analysis, cf. Shen (2001). 
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log(n) 

Fig. 6. L2 error for Pattern Function Estimator and Sieve Maximum Likelihood Estimator and different 
sample sizes: n = 100, . . . , 12800. Last graphic represents the optimum L 2 error for different sample 
size, using a logarithmic scale on both axis. 



• Kernel estimators for Wigner function. When estimating the Wigner function it seems more 
natural to use a kernel estimator such as in Cavalier (2000) and to combine this analysis with 
the deconvolution problem in the case noisy observations T| < 1, Butucea (2004). 

• Other quantum estimation problems. The methods used here for quantum tomography can 
be applied in other problems of quantum estimation, such as for example the calibration of 
measurement devices or the estimation of transformation of states under the action of quantum 
mechanical devices. 
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